Abstract

Combine is an add-on to SigSpec and Cinderella. A SigSpec result file
or a file generated by Cinderella contains the significant sinusoidal signal
components in a time series. In this file, Combine checks one frequency after
the other for being a linear combination of previously examined frequencies. If
this attempt fails, the corresponding frequency is considered "genuine". Only
genuine frequencies are used to form linear combinations subsequently. A purely
heuristic model is employed to assign a reliability to each linear combination
and to decide whether to consider a frequency genuine or a linear combination.

1. What is Combine?

Combine performs an iterative analysis of the frequencies in a result.dat
file generated by SigSpec (Reegen 2005, 2007, 2009) or one of the output
files generated by Cinderella (Reegen et al. 2008; Reegen 2009). The input
file type is detected automatically.

If the attempt to interpret a given frequency as a linear combination fails,
this frequency is considered genuine. Only genuine frequencies are used to
form linear combinations in the subsequent iterations. The decision whether
to accept a linear combination is made using a mathematical model that assigns
an equivalent spectral significance (hereafter abbreviated as 'sig') to a linear
combination. This equivalent sig is compared to the sig of the given signal
component, and the program adopts the linear combination only if its
equivalent sig is high enough.

If there is more than one linear combination available, Combine picks the 
one with the highest equivalent significance. 

The underlying model leading to equivalent sigs and the reliabilities of linear
combinations is purely heuristic and attempts to mimic the examination by an
experienced person.

2. Input



Combine is called by the command line 



combine <infile> 



where <infile> is the name (or path, if desired) of a SigSpec result file or
an output file generated by Cinderella.

Caution: Combine overwrites existing output files!

Furthermore, the user may pass a set of specifications to Combine by means
of a file <infile>.ini in the same folder as <infile>. For specifications not
given by the user, defaults are used.

The file <infile>.ini has to be terminated by a carriage-return
character, otherwise the program hangs!



3. How Combine Works 

For a peak with given frequency and significance, all possible combinations of
previously detected genuine frequencies $f_k$, $k = 1, \ldots, K$, are computed.
K is the maximum number of frequencies in a linear combination. The resulting
frequency for a linear combination is

$$ f' := \sum_{k=1}^{K} c_k f_k \qquad (1) $$

and shall be compared to a frequency $f$ in the input file.

3.1. Sig vs. csig 

If the keyword csig is provided in the file <infile>.ini, the cumulative sig
(Reegen 2007, 2009) is used instead of the sig. This keyword does not require
any parameters.

3.2. Frequency resolution 

The adjustment of the frequency resolution $\delta f$ is consistent with Eq. ?? in
the Cinderella manual (Reegen 2009), where the total time interval width T
has to be provided by the user, because Combine does not read the time series
itself. Specifying T explicitly also gives the user the flexibility to choose a
different value. This interval width is provided by means of the keyword dt in
the file <infile>.ini, followed by a floating-point number. The default setting
is that Combine determines the closest pair of frequencies and uses its inverse
frequency spacing as T.

The second parameter, r, is specified using the keyword tol, again followed 
by a floating-point number, in full consistency with Cinderella. The default 
value is r = 0, forcing Combine to employ the Rayleigh frequency resolution. 

The frequency tolerance permits only linear combinations where

$$ a := |f - f'| < \delta f . \qquad (2) $$

The quantity a is the accuracy of a linear combination and is provided in
the output.
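
The default case described above (r = 0, Rayleigh resolution) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and not part of Combine, and the dependence of the resolution on non-zero values of r follows the Cinderella manual and is not reproduced here.

```python
from itertools import combinations


def default_time_base(freqs):
    """Inverse spacing of the closest pair of frequencies (default T)."""
    spacing = min(abs(a - b) for a, b in combinations(freqs, 2))
    return 1.0 / spacing


def rayleigh_resolution(time_base):
    """Rayleigh frequency resolution for a time base T (case r = 0)."""
    return 1.0 / time_base


def within_tolerance(f, f_comb, resolution):
    """Eq. (2): a combination is admitted only if |f - f'| < delta_f."""
    return abs(f - f_comb) < resolution


# A time base of 26 days gives a resolution of about 0.03846 cycles/day.
delta_f = rayleigh_resolution(26.0)
print(round(delta_f, 5))
print(within_tolerance(5.20, 5.182, delta_f))
```

Note that the default time base is undefined if two input frequencies coincide exactly; the sketch does not guard against that case.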



3.3. Limit of harmonic order 

The range of harmonic orders is restricted by the parameter N, which is calcu- 
lated according to 

where $\mathrm{sig}_k$ denotes the sig associated with the frequency $f_k$, and
$\mathrm{sig}_K$ is the sig associated with the last frequency in the input file,
$f_K$. If the keyword csig is set, the csig is consistently taken instead of the
sig. The parameter $\Omega$ is provided by the keyword order in the file
<infile>.ini, followed by a floating-point number. The default value is 1. Given
the limit N, the coefficients of a linear combination are restricted to the
range from $-N$ to $N$ according to

$$ c_k = -N, \ldots, N \qquad (4) $$
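
The search space defined by Eqs. (1), (2) and (4) can be illustrated with a brute-force enumeration. The sketch below is not Combine's implementation: it assumes a single limit `n_max` for all components (Combine derives N per component), leaves out the zero-frequency case, and uses a hypothetical function name.

```python
from itertools import product


def candidate_combinations(f, genuine, n_max, resolution):
    """Yield (coeffs, f_comb, accuracy) for combinations matching f.

    coeffs[k] is the integer coefficient c_k of genuine[k], taken from
    -n_max..n_max (Eq. 4); f_comb is the sum of c_k * f_k (Eq. 1).
    """
    for coeffs in product(range(-n_max, n_max + 1), repeat=len(genuine)):
        if not any(coeffs):
            continue  # the all-zero combination is not treated here
        f_comb = sum(c * fk for c, fk in zip(coeffs, genuine))
        accuracy = f - f_comb
        if abs(accuracy) < resolution:  # Eq. (2)
            yield coeffs, f_comb, accuracy


genuine = [5.182, 2.675, 3.055]
hits = list(candidate_combinations(7.86, genuine, 1, 1.0 / 26.0))
print(hits)
```

The number of coefficient vectors grows as (2N+1) to the power of the number of genuine frequencies, which is why the enumeration becomes expensive as more genuine components are found.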



3.4. Equivalent sig 

Each linear combination is assigned an equivalent sig,

$$ \mathrm{sig}_{\mathrm{eq}} := \min_k \left( |c_k|^{-\delta_k} \, \mathrm{sig}_k \right) - \lambda \log K , \qquad (5) $$

where $\delta_k$ denotes the decay parameter provided by the keyword decay, and
$\lambda$ is the combination damping, specified using the keyword cdamp. Both
keywords are followed by floating-point numbers. The default values for both
parameters are 1.

Figure 1: Ratio of equivalent sig over sig of an individual signal component vs.
polynomial coefficient $c_k$ associated to the signal component. Five graphs for
different values of the decay parameter $\delta_k$ are presented.

Fig. 1 displays the relative sig correction with increasing coefficient $c_k$ for
five different values of the decay parameter $\delta_k$. Fig. 2 illustrates the
correction of equivalent sig with increasing number K of components
contributing to a linear combination for five different values of the
combination damping $\lambda$.
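
A minimal numerical sketch of Eq. (5) may help to see how the two corrections interact. It assumes that K counts the components with non-zero coefficient, that a single decay value applies to all components, and that the logarithm is decadic; the base is not stated in the text, so it is exposed as a parameter. The function name is hypothetical.

```python
import math


def equivalent_sig(coeffs, sigs, decay=1.0, cdamp=1.0, log=math.log10):
    """Equivalent sig of a linear combination following Eq. (5).

    coeffs and sigs are parallel sequences (c_k, sig_k); components with
    c_k = 0 do not take part. The base of the logarithm is an assumption.
    """
    used = [(abs(c), s) for c, s in zip(coeffs, sigs) if c != 0]
    if not used:
        raise ValueError("at least one non-zero coefficient is required")
    # Decay term: each component's sig is reduced by |c_k|^(-decay).
    weakest = min(s * c ** (-decay) for c, s in used)
    # Damping term: penalty growing with the number of components K.
    return weakest - cdamp * log(len(used))


print(equivalent_sig([3, -1], [40.0, 20.0], decay=1.5, cdamp=10.0))
```

A single component with coefficient 1 keeps its sig unchanged, since both the decay factor and the damping term vanish in that case.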

3.5. Reliability and sensitivity 

A linear combination is only accepted if the equivalent sig of the combination
is high enough compared to the significance of the given peak according to

$$ R := \frac{\mathrm{sig}_{\mathrm{eq}}}{\mathrm{sig}_f} > S , \qquad (6) $$

where S is the sensitivity, which can be adjusted by means of the keyword
sens in the file <infile>.ini. The keyword is followed by a floating-point
number, and the default value is 0.1. If all examined linear combinations have
a reliability below S, the examined signal component is considered genuine.
Hence the sensitivity provided by the keyword sens allows the user to directly
adjust the number of genuine components in a list of frequencies.

Figure 2: Additive significance correction for a linear combination employing K
different signal components. Five graphs for different values of the combination
damping $\lambda$ are presented.

The ratio of sigs, R, is called the reliability of a linear combination and is
part of the Combine output. If multiple combinations are available, Combine
picks the one with the highest reliability.
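
The acceptance rule of Eq. (6) and the choice among several candidates can be sketched as follows. The function names and the tuple layout of the result are hypothetical; reporting a reliability of zero for genuine components mirrors the convention used in the output file.

```python
def reliability(sig_eq, sig_f):
    """Eq. (6): ratio of equivalent sig to the sig of the examined peak."""
    return sig_eq / sig_f


def classify(sig_f, candidates, sens=0.1):
    """Pick the most reliable candidate or declare the peak genuine.

    candidates is a list of (label, sig_eq) pairs. Returns
    ('combination', label, R) or ('genuine', None, 0.0).
    """
    best_label, best_r = None, None
    for label, sig_eq in candidates:
        r = reliability(sig_eq, sig_f)
        if best_r is None or r > best_r:
            best_label, best_r = label, r
    # A combination is accepted only if its reliability exceeds sens.
    if best_r is None or best_r <= sens:
        return ("genuine", None, 0.0)
    return ("combination", best_label, best_r)


print(classify(10.0, [("a", 1.5), ("b", 3.0)], sens=0.2))
```

Raising `sens` makes the test stricter, so more peaks end up classified as genuine.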



4. Output 

Genuine frequencies are assigned identifiers f#index#, where #index# denotes
an integer number starting at 1. According to the number of significant signals
present in the file <infile>, Combine chooses a constant number of digits.
For example, if the input file contains from 1 to 9 frequencies, the identifiers
for genuine frequencies are f1, f2, ... If the input file contains from 10 to 99
frequencies, Combine enumerates the genuine components f01, f02 and
so on. This format convention applies to the indexing of rows also.

Linear combinations are denoted by the frequency identifiers of the genuine
components and appear as a formula: if the frequency under consideration
is, e.g., $f_1 + 3 f_2 - 2 f_{10} - f_{14} - 0.00214$, Combine displays it as
f01+3f02-2f10-f14-0.00214 both on the screen and in the output file. In
this context, -0.00214 is the frequency accuracy.
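
The naming convention can be reproduced with a short formatting sketch. The helper names are hypothetical, and printing the accuracy with six significant digits is an assumption chosen to match the screen output shown in the examples, not a documented property of Combine.

```python
def identifier_width(n_signals):
    """Number of digits needed for the largest index (9 -> 1, 99 -> 2)."""
    return len(str(n_signals))


def format_identifier(index, width):
    """Zero-padded frequency identifier, e.g. f01 for index 1, width 2."""
    return "f{:0{w}d}".format(index, w=width)


def format_combination(terms, accuracy, width):
    """Render terms [(coefficient, index), ...] plus the signed accuracy."""
    parts = []
    for coeff, index in terms:
        sign = "-" if coeff < 0 else "+"
        factor = "" if abs(coeff) == 1 else str(abs(coeff))
        parts.append(sign + factor + format_identifier(index, width))
    text = "".join(parts).lstrip("+")  # no leading plus sign
    return text + "{:+g}".format(accuracy)


print(format_combination([(1, 1), (3, 2), (-2, 10), (-1, 14)], -0.00214, 2))
```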

The screen output consists of a single line for each signal (i. e., for each row 
in the input file). Combine displays 

1. the row index, 

2. the linear combination including the frequency accuracy, and 

3. the reliability R (Eq. 6). 

For genuine frequencies, Combine displays only the row index and the fre- 
quency identifier. At runtime, the most reliable linear combination identified so 
far is displayed. If Combine finds a "better" solution, the line on the screen 
is updated. 

By default, Combine generates an output file <infile>.cmb. It contains
a row index in the first column, then all information of the input file in the
further columns, plus three additional columns at the end:

1. reliability R (Eq. 6),¹

2. total number of linear combinations within the frequency resolution, 

3. the linear combination itself, plus the frequency accuracy. If a frequency 
is considered genuine, only the frequency identifier is displayed. 

For convenience, a second output file <infile>.gen is produced by Combine.
It is truncated to the genuine frequencies only and contains the row
index in the first column, then all the information provided in the input file,
plus the frequency identifier in the last column. The columns for the
reliability and the number of linear combinations within the frequency
resolution are omitted. This file provides the opportunity to have all the
genuine frequencies available at a glance.

Example.² The sample project CombineNative contains a list of significant
frequencies found in the MOST³ (Microvariability & Oscillations of STars)
photometry of ζ Oph (Walker et al. 2003, 2001 2005). According to the
input file result.dat, altogether 294 formally significant signal components
(sig > 5) were identified.

¹ Zero values indicate genuine frequencies.

² The computation of the sample project CombineNative takes 40 minutes on an Intel
Core2 CPU T5500 (1.66 GHz) under Linux 2.6.18.8-0.9-default i686.

³ MOST is a Canadian Space Agency mission, jointly operated by Dynacon Inc., the
University of Toronto Institute of Aerospace Studies, the University of British Columbia,
and with the assistance of the University of Vienna, Austria.

The file result.dat.ini contains five keywords:

order 0.2
dt 26
decay 1.5
cdamp 10
sens 0.2



The dataset is 26 days long, and the frequencies are provided in cycles per
day. Thus Combine will assume a Rayleigh frequency resolution of 0.03846
cycles per day. There is no specification for the frequency tolerance
parameter (keyword tol). Thus the default setting is used.

Running Combine by typing the command line combine result.dat
yields a welcome message on the screen.



[ASCII-art "Combine" logo]

Version 1.0
************************************************
by Piet Reegen
Institute of Astronomy
University of Vienna
Tuerkenschanzstrasse 17
1180 Vienna, Austria
Release date: August 18, 2009



The program finds out that the input file is a seven-column SigSpec 
result file, determines the number of rows and reads the input data. Note 
that 295 rows correspond to 294 significant signal components, because the 
last row in the SigSpec result file contains information on the residuals 
(see SigSpec manual, p. 11). 

*** start ************************************* 

File result.dat: SigSpec format 

rows 295 

read input file 



Then the search for linear combinations starts. For each row in the input
file, Combine displays the most reliable combination detected so far.
The first four signal components are found to be genuine. Since the
number of signal components is 294, Combine uses a three-digit format for
the row indices and frequency identifiers.

row 001: f001
row 002: f002 
row 003: f003 
row 004: f004 

For rows 5 and 6 in the input data, the screen output contains the most 
reliable linear combination (including the frequency accuracy) and the reli- 
ability. 

row 005: 3f001-f002-2f003-f004+0.0284306 0.236585
row 006: 3f001+2f002-f004+0.0136421 0.35803

An examination of the output file result.dat.cmb shows that rows 005
and 006 end with

0.2365853347754522 1 3f001-f002-2f003-f004+0.0284306168856169
0.3580304203945811 2 3f001+2f002-f004+0.0136420746028509

These entries refer to the columns added by Combine. The first value is the 
reliability, the second one is the number of examined linear combinations, 
and the last column represents the linear combination itself. For row 005, 
there is only one linear combination available within the frequency resolu- 
tion, for row 006 the number of linear combinations taken into account is 
2. 

Subsequently, the screen output indicates a fifth genuine frequency. 

row 007: f005 

The frequency in row number 8 is 0.02783 cycles per day, which is below 
the frequency resolution. Thus the component is considered to refer to zero 
frequency, and in this case, no reliability is evaluated. 

row 008: 0+0.0278395 

In the further rows of the input file, no more genuine frequencies are
detected.

row 009: -f002+f005-0.025485 0.759005
row 010: f001-f002-f004+f005+0.0313392 0.490535
row 011: -f001+f004-0.00275538 1.26888
row 012: f001-f002-f004+f005-0.0295542 0.680494
row 013: -2f001+2f003+f004-0.00567519 0.523911
row 014: -f001+f005+0.024731 1.72772
row 015: 2f002+0.0249392 1.47442
row 016: 2f001-f004-0.0100088 1.70761
row 017: -f001+2f002-0.00217389 1.55951
row 018: f001-f002+0.00824894 3.95466
row 019: f002+f005-0.00668728 1.64167
row 020: 2f002+f003-f005-0.00199182 0.779607

Remarkably, Combine is able to compose all 294 frequencies contained in
the input file as linear combinations of no more than five genuine
frequencies. However, a different set of parameters in the configuration
file result.dat.ini can produce completely different output. Note that the
time consumption of Combine increases dramatically with the number of
genuine frequencies identified, because each additional genuine frequency
increases the number of possible linear combinations disproportionately.
A list of genuine frequencies only is found in the output file
result.dat.gen.

5 genuine frequencies found. 
Finished. 

+ * + * + 

Thank you for using Combine! 
Questions or comments? 

Please contact Piet Reegen (reegen@astro.univie.ac.at)
Bye! 

5. Order of Input Rows 

Since Combine processes the input file row by row, the order of rows plays
a crucial part in the way the analysis is performed. Changing the order of
rows in the input file changes the base upon which the linear combinations
are formed. Thus, if some frequencies are known beforehand to be genuine,
it is advisable to place them at the top of the input file, so that all further
frequencies are checked for linear combinations of preferably these
components.
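
Re-ordering the input can be done by hand or with a small helper such as the one sketched here. The function is hypothetical and treats the file purely as a list of text lines; keeping the last line in place assumes a SigSpec result file, whose final row describes the residuals.

```python
def move_rows_to_top(lines, row_numbers, keep_last=True):
    """Return lines with the given 1-based rows moved to the top.

    keep_last leaves the final line in place (assumed here to be the
    residual row that closes a SigSpec result file).
    """
    body = lines[:-1] if keep_last else list(lines)
    tail = lines[-1:] if keep_last else []
    picked = [body[n - 1] for n in row_numbers]
    chosen = set(row_numbers)
    rest = [ln for i, ln in enumerate(body, start=1) if i not in chosen]
    return picked + rest + tail


# Move the third row to the top; "res" stands in for the residual row.
print(move_rows_to_top(["a", "b", "c", "d", "res"], [3]))
```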

Example.⁴ The input of the sample project order is essentially the same
as for CombineNative. Only the order of rows is slightly modified: the
6th signal component of the file result.dat in the project CombineNative,
which refers to the orbit frequency of the MOST spacecraft, appears now
on top. This re-ordering forces Combine to consider 14.188 d⁻¹ genuine.
The configuration file result.dat.ini is also the same as for the project
CombineNative.

Again, there is a base of five genuine frequencies, three of which are
identical to the project CombineNative, namely 5.182, 2.675 and 3.055 cycles
per day. The two genuine signal components at 6.722 and 7.193 cycles per
day are replaced by 14.188 and 0.0697 cycles per day.

⁴ The computation of the sample project order takes 40 minutes on an Intel Core2
CPU T5500 (1.66 GHz) under Linux 2.6.18.8-0.9-default i686.

6. Rejecting Unwanted Linear Combinations

The user may indicate unwanted signal components in the input file
<infile> by applying a minus sign to the corresponding frequencies. Combine
reacts with a corresponding change of the sign for the reliability. If the user
additionally provides the keyword reject in the file <infile>.ini, all rows
for which the most reliable linear combination contains one or more unwanted
frequencies are omitted from the output file <infile>.cmb.

The screen output contains linear combinations incorporating unwanted
frequencies at runtime. To indicate such unwanted combinations, the reliability
is displayed as a negative value. If the examination of an input line finishes
with the "best" linear combination containing an unwanted frequency, the
corresponding line is removed from the screen output.
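
The sign convention and the effect of the reject keyword can be summarised in a short sketch. Both function names are hypothetical, and the representation of a combination as (coefficient, index) pairs is an assumption made for illustration only.

```python
def signed_reliability(r, terms, unwanted):
    """Flip the sign of the reliability if any term uses an unwanted index.

    terms is a list of (coefficient, index) pairs; unwanted is the set of
    indices whose frequencies carry a minus sign in the input file.
    """
    uses_unwanted = any(index in unwanted for _, index in terms)
    return -r if uses_unwanted else r


def keep_row(r_signed, reject):
    """With the reject keyword set, drop rows whose best combination
    has a negative (i.e. unwanted) reliability."""
    return not (reject and r_signed < 0)


# f001+3f002+2f003 with f001 marked as unwanted
r = signed_reliability(0.325575, [(1, 1), (3, 2), (2, 3)], {1})
print(r, keep_row(r, reject=True))
```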

Example.⁵ The input of the sample project reject is the same as for
order, with a minus sign for the first frequency of 14.188 cycles per day,
which represents the orbit of the MOST spacecraft. The file result.dat.ini
contains an additional line,

reject

The combination of this keyword and the negative sign for the first signal
component in the input file forces Combine to reject all linear combinations
incorporating the frequency 14.188 cycles per day from the output file
result.dat.cmb. In the screen output, such linear combinations are
indicated by a negative reliability, e.g.

row 005: f001+3f002+2f003+0.0136421 -0.325575

This entry is visible at runtime, but vanishes from the screen output
when the calculations for row 006 start.

⁵ The computation of the sample project reject takes 40 minutes on an Intel Core2
CPU T5500 (1.66 GHz) under Linux 2.6.18.8-0.9-default i686.

7. Keywords Reference

This section is a compilation of all keywords accepted by Combine. A brief
description of arguments and default values is given. If an argument is
required, it is indicated by <double>, and default values are given in
parentheses, e.g. (1).

cdamp <double> (1)

combination damping, i.e. reduction of reliability of a linear combination with
increasing number of components employed, Sect. 3.4

csig

forces Combine to use csig instead of sig, Sect. 3.1

decay <double> (1)

decay of reliability assigned to a frequency multiple for increasing harmonic
order, Sect. 3.4

dt <double> (auto)

total time interval of the time series, defining the Rayleigh frequency
resolution. By default, Combine determines the Rayleigh frequency resolution as
the frequency spacing of the closest pair of frequencies found in the input data,
Sect. 3.2

order <double> (auto)

parameter restricting the range of harmonics of individual frequency components
to be employed to form linear combinations, Sect. 3.3

reject

activates the rejection of unwanted linear combinations. The user may indicate
unwanted frequencies by a minus sign in the input file <infile>. If this
keyword is set, Combine automatically suppresses the output of those signal
components for which the most reliable linear combination incorporates such
an unwanted frequency, Sect. 6

sens <double> (0.1)

reliability limit to be exceeded in order to accept a linear combination, adjusts
the number of genuine components in a frequency list, Sect. 3.5

tol <double> (0)

Combine frequency tolerance parameter, Sect. 3.2
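
To illustrate the keyword format, the following sketch reads such a configuration into a dictionary and fills in the defaults listed above. It is a hypothetical reader written for this manual, not the parser used by Combine; in particular, `None` stands for the automatic settings, and unknown keywords are rejected here although Combine's behaviour in that case is not documented.

```python
DEFAULTS = {
    "cdamp": 1.0,
    "csig": False,
    "decay": 1.0,
    "dt": None,      # auto: derived from the closest pair of frequencies
    "order": None,   # listed as auto in this reference
    "reject": False,
    "sens": 0.1,
    "tol": 0.0,
}
FLAGS = {"csig", "reject"}  # keywords that take no argument


def parse_ini(text):
    """Parse keyword lines into a settings dict, filling in defaults."""
    settings = dict(DEFAULTS)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        key = fields[0]
        if key not in DEFAULTS:
            raise ValueError("unknown keyword: " + key)
        if key in FLAGS:
            settings[key] = True
        else:
            settings[key] = float(fields[1])
    return settings


example = "order 0.2\ndt 26\ndecay 1.5\ncdamp 10\nsens 0.2\n"
print(parse_ini(example))
```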

8. Online availability

The ANSI-C code is available online at http://www.sigspec.org. For further
information, please contact P. Reegen, peter.reegen@univie.ac.at.